PPM compression without escapes

Author

  • Peter M. Fenwick
Abstract

A significant cost in PPM data compression (and often the major cost) is the provision and efficient coding of escapes while building contexts. This paper presents some recent work on eliminating escapes in PPM compression, using bit-wise compression with binary contexts. It shows that PPM without escapes can achieve averages of 2.5 bits per character on the Calgary Corpus and 2.2 bpc on the Canterbury Corpus, both values comparing well with accepted good compressors.
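To make the idea of escape-free, bit-wise modelling concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the function name bitwise_bpc, the fixed context order, and the add-one-half counting estimator are all assumptions chosen for illustration. Each byte is predicted one bit at a time, and the context of a bit is the previous few bytes plus the bits of the current byte already seen. Because every binary context assigns a nonzero probability to both 0 and 1, a never-before-seen symbol needs no escape.

```python
import math
from collections import defaultdict


def bitwise_bpc(data: bytes, order: int = 2) -> float:
    """Ideal code length, in bits per character, of a toy bit-wise model.

    Illustrative assumptions (not taken from the paper): a single fixed
    context order and an add-1/2 estimate for each binary context.
    """
    # counts[context] = [number of 0 bits seen, number of 1 bits seen]
    counts = defaultdict(lambda: [0, 0])
    total_bits = 0.0
    for i, byte in enumerate(data):
        history = data[max(0, i - order):i]  # up to `order` preceding bytes
        prefix = 1  # leading 1 marks how many bits of this byte are known
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            c = counts[(history, prefix)]
            # Both outcomes always have probability > 0, so no escape is needed.
            p = (c[bit] + 0.5) / (c[0] + c[1] + 1.0)
            total_bits -= math.log2(p)  # cost an ideal arithmetic coder would pay
            c[bit] += 1
            prefix = (prefix << 1) | bit
    return total_bits / len(data) if data else 0.0
```

Calling bitwise_bpc(text.encode()) on a sample gives a rough bits-per-character figure for this toy model only; it does not reproduce the 2.5 bpc and 2.2 bpc results quoted above, which depend on the paper's own context handling.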


Similar resources

Native Language Identification with PPM

This paper reports on our work in the NLI shared task 2013 on Native Language Identification. The task is to automatically detect the native language of the authors of TOEFL essays in a set of given test documents in English. The task was solved by a system that used the PPM compression algorithm based on an n-gram statistical model. We submitted four runs; word-based PPMC algorithm with normaliza...


Ensemble Prediction by Partial Matching

Prediction by Partial Matching (PPM) is a lossless compression algorithm which consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPM-Ens which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm is evaluated on the Calgary corpus. The results indicate that combining multiple contexts leads ...


PPMexe: PPM for Code Compression

With the emergence of software delivery platforms such as Microsoft’s .NET, code compression has become one of the core enabling technologies strongly affecting system performance. In this paper, we present compression mechanisms for executables that explore their syntax and semantics to achieve superior compression rates. The fundament of our compression codec is the generic paradigm of predic...


Generic Adaptive Syntax-Directed Compression for Mobile Code

We propose a new scheme for compressing mobile programs. Our proposal is meant as part of a larger infrastructure for code distribution and deployment. In this paper we show how to effectively compress programs on the source level by compressing abstract syntax trees (ASTs) which are equivalent to source code (modulo comments and layout). We compress ASTs by adapting the well-known PPM (predicti...


LIPT: A Reversible Lossless Text Transform to Improve Compression Performance

Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. We propose an alternative approach in this paper to develop a reversible transformation that can be applied to a source ...



Journal title:
  • Softw., Pract. Exper.

Volume 42  Issue 

Pages  -

Publication date 2012